MoBIoS Index: Support Distance-Based Queries in Bioinformatics

Authors

  • Rui Mao
  • Weijia Xu
  • Willard S. Briggs
  • Smriti R. Ramakrishnan
  • Daniel P. Miranker

Abstract

Given a database, answering a distance-based query means retrieving all the data objects in the database that are in close proximity to a query object. Proximity is defined by any metric distance function. "Close" can mean within a certain distance (a range query), among the k nearest neighbors, or the two in combination. Answering distance-based queries is a fundamental component of many biological applications, including sequence homology, protein identification by spectral database look-up, and biomedical image retrieval. Most systems for retrieving similar biological data objects are domain-specific, so the retrieval problem must be revisited for each new model of similarity. (Metric) distance-based indexing only requires that the data can be abstracted into a metric space, enabling the reuse of the same software package for many problems. The Molecular Biological Information System (MoBIoS) is a metric-space DBMS targeting bioinformatics applications. In this paper we describe the programmatic interface for MoBIoS index methods. In addition to built-in metrics, the interfaces enable users to integrate new metrics. The system supports four different retrieval methods. We characterize these methods and their applicability to different problems.
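
To make the query types in the abstract concrete, the following is a minimal sketch of a range query, a k-nearest-neighbor query, and their combination over a user-supplied metric. It is not the MoBIoS interface: the function names (hamming, range_query, knn_query, bounded_knn_query, pivot_range_query) are hypothetical, the Hamming distance stands in for whatever metric an application would plug in, and the single-pivot filter is only one generic way a metric-space index can use the triangle inequality to skip distance computations.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]  # any function satisfying the metric axioms


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def range_query(data: Sequence[T], query: T, radius: float, dist: Metric) -> List[T]:
    """Linear scan: every object within `radius` of `query`."""
    return [obj for obj in data if dist(query, obj) <= radius]


def knn_query(data: Sequence[T], query: T, k: int, dist: Metric) -> List[T]:
    """Linear scan: the k objects closest to `query` (ties keep input order)."""
    return sorted(data, key=lambda obj: dist(query, obj))[:k]


def bounded_knn_query(data: Sequence[T], query: T, k: int,
                      radius: float, dist: Metric) -> List[T]:
    """The two in combination: at most k neighbors, all within `radius`."""
    return knn_query(range_query(data, query, radius, dist), query, k, dist)


def pivot_range_query(data: Sequence[T], pivot: T, pivot_dists: Sequence[float],
                      query: T, radius: float, dist: Metric) -> List[T]:
    """Range query that prunes with the triangle inequality.

    Since |d(q, p) - d(p, o)| <= d(q, o), an object o can be skipped without
    computing d(q, o) whenever that lower bound already exceeds `radius`.
    `pivot_dists[i]` must hold the precomputed distance d(pivot, data[i]).
    """
    d_qp = dist(query, pivot)
    results = []
    for obj, d_po in zip(data, pivot_dists):
        if abs(d_qp - d_po) > radius:
            continue  # pruned: cannot lie within the query radius
        if dist(query, obj) <= radius:
            results.append(obj)
    return results


if __name__ == "__main__":
    fragments = ["ACGT", "ACGA", "TCGA", "TTTT", "ACCT"]
    pivot = "TTTT"
    pivot_dists = [hamming(pivot, f) for f in fragments]  # computed once, offline
    print(range_query(fragments, "ACGT", 1, hamming))
    print(knn_query(fragments, "ACGT", 2, hamming))
    print(pivot_range_query(fragments, pivot, pivot_dists, "ACGT", 1, hamming))
```

Because the query functions only ever call dist, swapping in a different metric (for example an edit distance over sequences of unequal length) requires no change to the retrieval code, which is the reuse argument the abstract makes for distance-based indexing.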

Related resources

MoBIoS: A Metric-Space DBMS to Support Biological Discovery

MoBIoS is a specialized database management system whose storage manager is based on metric-space indexing, and whose query language entails biological data types. When relational database management systems are used to support biological data, important data types are relegated to blob and unstructured text fields. Consequently, even simple, but critical queries are executed by sequentially du...

Using MoBIoS' scalable genome join to find conserved primer pair candidates between two genomes

MOTIVATION For the purpose of identifying evolutionary reticulation events in flowering plants, we determine a large number of paired, conserved DNA oligomers that may be used as primers to amplify orthologous DNA regions using the polymerase chain reaction (PCR). RESULTS We develop an initial candidate set by comparing the Arabidopsis and rice genomes using MoBIoS (Molecular Biological Infor...

Biosequence Use Cases in MoBIoS SQL

The sequencing and annotation of entire genomes has enriched the content of biological sequence databases such that new methods of sequence analysis, comparison and retrieval are being invented and rerun on an increasingly regular basis, generating new and more complete biological information. Examples include full genome comparisons and phylogenetic footprinting. Simple identification of homol...

Special Issue on Querying Biological Sequences

The sequencing and annotation of entire genomes has enriched the content of biological sequence databases such that new methods of sequence analysis, comparison and retrieval are being invented and rerun on an increasingly regular basis, generating new and more complete biological information. Examples include full genome comparisons and phylogenetic footprinting. Simple identification of homol...

Clustering Sequences in a Metric Space: The MoBIoS Project

We are developing a [Molecular] Biological Information System (MoBIoS) based on metric space indices. Unfortunately, common similarity measures for sequence alignment do not form a metric-distance function. This is particularly vexing since the usual definition of edit distance does form a metric. Most clearly, the use of PAM log-odds matrices [2] yields higher similarity scores for more closel...

Publication date: 2006